Fusion of Multispectral Data Through Illumination-aware Deep Neural Networks for Pedestrian Detection
Multispectral pedestrian detection has received extensive attention in recent
years as a promising solution to facilitate robust human target detection for
around-the-clock applications (e.g. security surveillance and autonomous
driving). In this paper, we demonstrate that illumination information encoded in
multispectral images can be utilized to significantly boost the performance of
pedestrian detection. A novel illumination-aware weighting mechanism is presented
to accurately characterize the illumination condition of a scene. Such illumination
information is incorporated into two-stream deep convolutional neural networks
to learn multispectral human-related features under different illumination
conditions (daytime and nighttime). Moreover, we utilize illumination
information together with multispectral data to generate more accurate semantic
segmentation, which is in turn used to boost pedestrian detection accuracy. Putting all
of the pieces together, we present a powerful framework for multispectral
pedestrian detection based on multi-task learning of illumination-aware
pedestrian detection and semantic segmentation. Our proposed method is trained
end-to-end using a well-designed multi-task loss function and outperforms
state-of-the-art approaches on the KAIST multispectral pedestrian dataset.
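As a rough illustration of the gating idea described above, the following sketch (our own simplification, not the authors' released code; the module name and layer sizes are assumptions) predicts a day/night weight from the visible image and uses it to fuse the two feature streams:

```python
# A minimal sketch of illumination-aware fusion, assuming a two-stream network
# has already produced per-modality feature maps of matching shape.
import torch
import torch.nn as nn

class IlluminationGate(nn.Module):
    """Predicts w in (0, 1) from the visible image and fuses the streams
    as w * f_visible + (1 - w) * f_thermal."""
    def __init__(self, in_channels: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),   # global context -> one vector per image
            nn.Flatten(),
            nn.Linear(16, 1),
            nn.Sigmoid(),              # ~1 for daytime scenes, ~0 for nighttime
        )

    def forward(self, visible_image, f_visible, f_thermal):
        w = self.net(visible_image).view(-1, 1, 1, 1)  # per-image weight
        return w * f_visible + (1.0 - w) * f_thermal   # illumination-aware fusion
```

The fused features would then feed the detection and segmentation heads of the multi-task network.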
Box-level Segmentation Supervised Deep Neural Networks for Accurate and Real-time Multispectral Pedestrian Detection
Effective fusion of complementary information captured by multi-modal sensors
(visible and infrared cameras) enables robust pedestrian detection under
various surveillance situations (e.g. daytime and nighttime). In this paper, we
present a novel box-level segmentation supervised learning framework for
accurate and real-time multispectral pedestrian detection by incorporating
features extracted in visible and infrared channels. Specifically, our method
takes pairs of aligned visible and infrared images with easily obtained
bounding box annotations as input and estimates accurate prediction maps to
highlight the existence of pedestrians. It offers two major advantages over the
existing anchor box based multispectral detection methods. Firstly, it
overcomes the hyperparameter setting problem that occurs during the training phase
of anchor box based detectors and can obtain more accurate detection results,
especially for small and occluded pedestrian instances. Secondly, it is capable
of generating accurate detection results using small-size input images, leading
to improved computational efficiency in real-time autonomous driving
applications. Experimental results on the KAIST multispectral dataset show that
our proposed method outperforms state-of-the-art approaches in terms of both
accuracy and speed.
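To make the box-level supervision concrete, here is a minimal sketch (our own illustration; the function name and conventions are assumptions) that converts bounding-box annotations into the kind of coarse segmentation target described above:

```python
# A minimal sketch, assuming boxes are given as (x1, y1, x2, y2) in pixels;
# this illustrates box-level segmentation supervision, not the paper's code.
import numpy as np

def boxes_to_mask(boxes, height, width):
    """Convert pedestrian bounding boxes into a binary target map."""
    mask = np.zeros((height, width), dtype=np.float32)
    for x1, y1, x2, y2 in boxes:
        x1, y1 = max(0, int(x1)), max(0, int(y1))
        x2, y2 = min(width, int(x2)), min(height, int(y2))
        mask[y1:y2, x1:x2] = 1.0  # pixels inside any pedestrian box are positive
    return mask

# Example: a 256x320 target with one pedestrian box.
target = boxes_to_mask([(40, 30, 80, 120)], height=256, width=320)
```

The network can then regress such maps with a per-pixel loss, which is what removes the anchor-box hyperparameters from the training phase.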
Unsupervised Domain Adaptation for Multispectral Pedestrian Detection
Multimodal information (e.g., visible and thermal imagery) enables robust
pedestrian detection, facilitating around-the-clock computer vision
applications, such as autonomous driving and video surveillance. However, it
remains a crucial challenge to train a reliable detector that works well across
different multispectral pedestrian datasets without manual annotations. In this
paper, we propose a novel unsupervised domain adaptation framework for
multispectral pedestrian detection, by iteratively generating pseudo
annotations and updating the parameters of our designed multispectral
pedestrian detector on the target domain. Pseudo annotations are generated using
the detector trained on the source domain, and then updated by fixing the
detector parameters and minimizing the cross-entropy loss without
back-propagation. Training labels are generated using the pseudo annotations by
considering the characteristics of similarity and complementarity between
well-aligned visible and infrared image pairs. The detector parameters are then
updated using the generated labels by minimizing our multi-detection
loss function with back-propagation. The optimal detector parameters are
obtained after iteratively updating the pseudo annotations and parameters.
Experimental results show that our proposed unsupervised multimodal domain
adaptation method achieves significantly higher detection performance than the
approach without domain adaptation, and is competitive with supervised
multispectral pedestrian detectors.
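The label-fusion step driven by cross-modal similarity and complementarity can be sketched as follows (a simplification under our own assumptions: detections are (box, score) pairs, and the IoU and confidence thresholds are illustrative values, not the paper's settings):

```python
# A minimal sketch of pseudo-label fusion across modalities; thresholds and
# the detection format are our assumptions, not the paper's settings.
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    def area(r):
        return max(0.0, r[2] - r[0]) * max(0.0, r[3] - r[1])

    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def fuse_pseudo_labels(dets_visible, dets_infrared, agree_iou=0.5, solo_conf=0.9):
    """Similarity: keep boxes confirmed in both channels. Complementarity:
    keep high-confidence detections that appear in only one channel."""
    labels = []
    for box, score in dets_visible:
        confirmed = any(iou(box, b) >= agree_iou for b, _ in dets_infrared)
        if confirmed or score >= solo_conf:
            labels.append(box)
    for box, score in dets_infrared:
        confirmed = any(iou(box, b) >= agree_iou for b, _ in dets_visible)
        if not confirmed and score >= solo_conf:
            labels.append(box)  # infrared-only but highly confident
    return labels
```

Each adaptation round would then retrain the detector on these fused labels with the multi-detection loss before regenerating the pseudo annotations.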
Video Event Recognition and Anomaly Detection by Combining Gaussian Process and Hierarchical Dirichlet Process Models
In this paper, we present an unsupervised learning framework for analyzing
activities and interactions in surveillance videos. In our framework, three
levels of video events are connected by a Hierarchical Dirichlet Process (HDP)
model: low-level visual features, simple atomic activities, and multi-agent
interactions. Atomic activities are represented as distributions of low-level
features, while complicated interactions are represented as distributions of
atomic activities. The entire learning process is unsupervised. Given a training
video sequence, low-level visual features are extracted based on optical flow,
then clustered into different atomic activities, and video clips are clustered
into different interactions. The HDP model automatically decides the number of
clusters, i.e. the categories of atomic activities and interactions. Based on
the learned atomic activities and interactions, a training dataset is generated
to train the Gaussian Process (GP) classifier. The trained GP models then operate
on newly captured video to classify interactions and detect abnormal events in
real time. Furthermore, the temporal dependencies between video events learned
by HDP-Hidden Markov Models (HMM) are effectively integrated into the GP classifier
to enhance the accuracy of the classification in newly captured videos. Our
framework couples the benefits of the generative model (HDP) with the
discriminative model (GP). We provide detailed experiments showing that our
framework achieves favorable performance in real-time video event classification
in a crowded traffic scene.
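As a rough sketch of the classification stage, suppose each clip has already been reduced by the HDP stage to a histogram over atomic activities; scikit-learn's GP classifier can stand in for the paper's GP model here (the data below is a toy stand-in, and flagging low-posterior clips as anomalies is our simplification of the anomaly-detection step):

```python
# A toy sketch, not the paper's pipeline: GP classification of video clips
# represented as histograms over atomic activities, with a simple
# low-posterior rule standing in for anomaly detection.
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
X = rng.dirichlet(np.ones(10), size=200)   # toy activity histograms per clip
y = rng.integers(0, 3, size=200)           # toy interaction categories

gp = GaussianProcessClassifier(kernel=1.0 * RBF(length_scale=1.0))
gp.fit(X, y)

probs = gp.predict_proba(X[:5])            # class posteriors for new clips
is_anomaly = probs.max(axis=1) < 0.5       # no interaction explains the clip well
```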
Learning Inter- and Intra-frame Representations for Non-Lambertian Photometric Stereo
In this paper, we build a two-stage Convolutional Neural Network (CNN)
architecture to construct inter- and intra-frame representations based on an
arbitrary number of images captured under different light directions,
performing accurate normal estimation of non-Lambertian objects. We
experimentally investigate numerous network design alternatives for identifying
the optimal scheme to deploy inter-frame and intra-frame feature extraction
modules for the photometric stereo problem. Moreover, we propose to utilize the
easily obtained object mask for eliminating adverse interference from invalid
background regions in intra-frame spatial convolutions, thus effectively
improving the accuracy of normal estimation for surfaces made of dark materials
or with cast shadows. Experimental results demonstrate that the proposed masked
two-stage photometric stereo CNN model (MT-PS-CNN) performs favorably against
state-of-the-art photometric stereo techniques in terms of both accuracy and
efficiency. In addition, the proposed method is capable of predicting accurate
and rich surface normal details for non-Lambertian objects of complex geometry
and performs stably given inputs captured in both sparse and dense lighting
distributions.
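The masking idea can be sketched in a few lines (our own illustration with assumed layer sizes and module name, not the MT-PS-CNN implementation): background activations are zeroed with the object mask around an intra-frame convolution so that invalid regions cannot contaminate the normal estimate.

```python
# A minimal sketch of mask-guided intra-frame convolution; layer sizes and
# the module name are our assumptions, not the MT-PS-CNN implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedConv(nn.Module):
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, features, mask):
        # features: (B, C, H, W); mask: (B, 1, H, W), 1 inside the object
        x = features * mask          # suppress background before convolving
        x = F.relu(self.conv(x))
        return x * mask              # keep responses valid only inside the mask

# Example: out = MaskedConv(64, 64)(feat, obj_mask)
```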
Transcriptome, microRNA, and degradome analyses of the gene expression of Paulownia with phytoplasma
- …